Using DMA

DMA access achieves higher throughput than PIO when the device transfers more than a few words of data at a time. DMA is typically set up by programming device registers with the target address and length, and leaving the device to generate a series of stores to, or loads from, memory. The details of device control are hardware-dependent.

The direction of a DMA transfer is measured with respect to the device, which operates independently. A DMA operation is either a DMA read (of memory data out to the device) or a DMA write (by the device, of data into memory).

DMA buffers should be cache-aligned in memory (see "Setting Up a DMA Transfer"). Prior to a DMA read, the driver should make sure that cached data has been written to memory using dki_dcache_wb(). Prior to a DMA write, the driver should make sure the CPU knows that cached data is invalid (or is about to become invalid) using dki_dcache_inval() (see "Managing Memory for Cache Coherency").
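
The direction and cache rules above are easy to invert by accident, because a read request from the caller's point of view is a DMA write from the device's point of view. The following user-space sketch restates those rules as plain functions. All names in it (dma_direction_for, cache_prep_for, is_cache_aligned, and the two enums) are hypothetical and are not part of any kernel header; the cache line size is passed in as a parameter because it is hardware-dependent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum dma_direction {
    DMA_READ,   /* device reads memory: data flows memory -> device */
    DMA_WRITE   /* device writes memory: data flows device -> memory */
};

enum cache_prep {
    CACHE_WRITE_BACK,  /* flush dirty lines so the device sees current data */
    CACHE_INVALIDATE   /* discard cached lines the device is about to overwrite */
};

/* A read request (B_READ set in the buf) fills memory, so the device
 * performs a DMA write; a write request is a DMA read. */
static enum dma_direction dma_direction_for(int is_read_request)
{
    return is_read_request ? DMA_WRITE : DMA_READ;
}

/* Which cache operation should precede a transfer in each direction. */
static enum cache_prep cache_prep_for(enum dma_direction dir)
{
    return (dir == DMA_READ) ? CACHE_WRITE_BACK : CACHE_INVALIDATE;
}

/* Returns 1 when the buffer starts on a cache-line boundary and its
 * length is a whole number of lines. line_size must be a power of two. */
static int is_cache_aligned(uintptr_t addr, size_t len, size_t line_size)
{
    return ((addr | (uintptr_t)len) & (uintptr_t)(line_size - 1)) == 0;
}
```

In a real driver the two enum results would correspond to the write-back and invalidate calls named above; the sketch only captures the decision, not the cache operation itself.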

DMA To Multiple Pages

Some devices can perform DMA only in a single transfer of data to a range of contiguous addresses. Such a device must be programmed separately for each individual page of data. Other devices are capable of transferring a series of page units to different addresses; that is, they support "scatter/gather" capability. These devices can be programmed once to transfer an entire buffer of data, regardless of whether the buffer spans multiple pages.

In either case, the pfxstrategy() entry point of a block device driver must calculate the physical addresses of a series of one or more pages, and program them into the device. When the device does not support scatter/gather, it is set up and started on each page of data individually, with an interrupt after each page. When the device supports scatter/gather, it is programmed with a list of page addresses all at once.
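
Whichever kind of device is used, the driver first needs to know how many pages a buffer touches, allowing for partial pages at either end. The sketch below is a portable stand-in for that calculation, written for user space so it can be tried outside the kernel. The function name pages_spanned is hypothetical, and the page size is a parameter because the kernel's value is platform-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len).
 * Hypothetical helper for illustration; page_size must be nonzero. */
static size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;                      /* an empty range touches no page */

    uintptr_t first = addr / page_size;              /* page of first byte */
    uintptr_t last  = (addr + len - 1) / page_size;  /* page of last byte  */
    return (size_t)(last - first + 1);
}
```

For example, a page-sized buffer that starts in the middle of a page spans two pages, not one, so a device without scatter/gather would need two separate transfers for it.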

DMA With Scatter/Gather Capability

Example 18-3 shows the skeleton of a pfxstrategy() entry point for a block device driver for a hypothetical GIO device that supports scatter/gather capability.

Example 18-3 : Strategy Code for Hypothetical Scatter/Gather GIO Device

/* Actual device setup for DMA, etc., if your board has
 * hardware scatter/gather DMA support.
 * Called from the hypo_write() routine via physio().
 */
void
hypo_strategy(struct buf *bp)
{
   int unit = geteminor(bp->b_dev)&1;
   int npages;
   volatile unsigned *sgregisters; /* ->device regs */
   int i;
   caddr_t v_addr;   /* kernel virtual address of the data */
   /* MISSING: any checking for initial state. */
   /* Get address of the scatter/gather registers */
   sgregisters = gbd_device[unit]->sgregisters;
   /* Get the kernel virtual address of the data; note
    * b_dmaaddr may be NULL if the BP_ISMAPPED(bp) macro
    * indicates false; in that case, the field bp->b_pages
    * is a pointer to a linked list of pfdat structure
    * pointers; that saves creating a virtual mapping and
    * then decoding that mapping back to physical addresses.
    * BP_ISMAPPED will never be false for character devices,
    * only block devices.
    */
   if(!BP_ISMAPPED(bp)) {
      cmn_err(CE_WARN,
         "gbd driver can't handle unmapped buffers");
      bioerror(bp, EIO);
      biodone(bp);
      return;
   }
   v_addr = bp->b_dmaaddr;
   /* Compute number of pages affected by this request.
    * The numpages() macro (sysmacros.h) returns the number of pages
    * that span a given length starting at a given address, allowing
    * for partial pages.  Unrealistically, we limit this to the
    * number of scatter/gather registers on board.
    * Note that this sample driver doesn't handle the
    * case of requests larger than the number of registers!
    */
   npages = numpages (v_addr, bp->b_bcount);
   if(npages > GBD_NUM_DMA_PGS) {
       bp->b_resid = IO_NBPP * (npages - GBD_NUM_DMA_PGS);
       npages = GBD_NUM_DMA_PGS;
       cmn_err(CE_WARN,
           "request too large, only %d pages max", npages);
   }
   /* Translate the virtual address of each page to a
    * physical page number and load it into the next
    * scatter/gather register.
    * btop() converts the byte value to a page value after
    * rounding down the byte value to a full page.
    */
   for (i = 0; i < npages; i++) {
      *sgregisters++ = btop(kvtophys(v_addr));
      v_addr += IO_NBPP;
   }
   /* Program the device for input or output */
   if ((bp->b_flags & B_READ) == 0)
      gbd_device[unit]->direction = GBD_WRITE;
   else
      gbd_device[unit]->direction = GBD_READ;
   /* Start the device going and return. The caller, either a
    * file system or uiophysio(), waits for the iodone() call
    * from the interrupt routine.
    */
   gbd_device[unit]->command = GBD_GO;
}
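
When a request is truncated to a fixed number of pages, Example 18-3 sets b_resid in whole pages. That figure is exact only when the buffer ends on a page boundary; otherwise it overstates the untransferred byte count. The following user-space sketch shows one way to compute the exact figures. The function names and the struct-free interface are hypothetical, and the page size is a parameter that must be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of the range [addr, addr + len) that fit in the first
 * max_pages pages it touches. Hypothetical helper for illustration. */
static size_t bytes_covered(uintptr_t addr, size_t len,
                            size_t page_size, size_t max_pages)
{
    if (max_pages == 0)
        return 0;

    /* The first page contributes only the part after the start offset. */
    size_t offset   = (size_t)(addr & (uintptr_t)(page_size - 1));
    size_t capacity = max_pages * page_size - offset;

    return (len < capacity) ? len : capacity;
}

/* Bytes left untransferred; this is the value b_resid would carry. */
static size_t bytes_residual(uintptr_t addr, size_t len,
                             size_t page_size, size_t max_pages)
{
    return len - bytes_covered(addr, len, page_size, max_pages);
}
```

With a 4096-byte page, a 20480-byte buffer starting 2048 bytes into a page spans six pages. Limited to four pages, 14336 bytes are covered and 6144 remain, whereas counting whole pages would report 8192.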

DMA Without Scatter/Gather Support

When the GIO device does not provide scatter/gather capability, the driver must program the transfer of each memory page individually, ensuring that the device does not attempt to store or load across a page boundary. The usual method is as follows:

In the pfxstrategy() routine, save the address of the buf_t for use by the pfxintr() entry point.
In the pfxstrategy() routine, program the device to transfer the data for the first page, and start the device going.
In the pfxintr() entry point, calculate the number of bytes remaining to transfer. If the count is zero, signal biodone(). If the count is nonzero, program the device to transfer the next page of data.

Under this design, there is no explicit loop over the successive pages of the transfer visible in the code. The loop is implicit in the fact that the pfxintr() entry point starts a new transfer, and so will be called again, until the transfer is complete.

Example 18-4 shows the code of the pfxstrategy() routine for a hypothetical GIO device without scatter/gather.

Example 18-4 : Strategy() Code for GIO Device Without Scatter/Gather

/* Actual device setup for DMA, etc., when the board
 * does NOT have hardware scatter/gather DMA support.
 * Called from the hypo_write() routine via physio().
 */
void
hypo_strategy(struct buf *bp)
{
   int unit = geteminor(bp->b_dev)&1;
   /* MISSING: any checking for initial state. */
   /* Get the kernel virtual address of the data; note
    * b_dmaaddr may be NULL if the BP_ISMAPPED(bp) macro
    * indicates false; in that case, the field bp->b_pages
    * is a pointer to a linked list of pfdat structure
    * pointers; that saves creating a virtual mapping and
    * then decoding that mapping back to physical addresses.
    * BP_ISMAPPED will never be false for character devices,
    * only block devices.
    */
   if(!BP_ISMAPPED(bp)) {
      cmn_err(CE_WARN,
         "gbd driver can't handle unmapped buffers");
      bioerror(bp, EIO);
      biodone(bp);
      return;
   }
   /* Save ->buf_t where interrupt handler can find it */
   gbd_curbp[unit] = bp;
   /*
    * Initialize the current transfer address and count.
    * The first transfer should finish the rest of the
    * page, but do no more than the total byte count.
    */
   gbd_curaddr[unit] = bp->b_dmaaddr;
   gbd_totcount[unit] = bp->b_bcount;
   gbd_curcount[unit] = IO_NBPP -
      ((unsigned int)gbd_curaddr[unit] & (IO_NBPP-1));
   if (bp->b_bcount < gbd_curcount[unit])
      gbd_curcount[unit] = bp->b_bcount;
   /* Tell the device starting physical address, count,
    * and direction */
   gbd_device[unit]->startaddr = kvtophys(gbd_curaddr[unit]);
   gbd_device[unit]->count = gbd_curcount[unit];
   if ((bp->b_flags & B_READ) == 0)
      gbd_device[unit]->direction = GBD_WRITE;
   else
      gbd_device[unit]->direction = GBD_READ;
   gbd_device[unit]->command = GBD_GO;   /* start DMA */
   /* and return; upper layers of kernel wait for iodone(bp) */
}
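
Example 18-4 covers only the strategy side. The bookkeeping that the pfxintr() entry point would perform after each page can be sketched in user space as follows. This is not the interrupt handler itself: it models only the address and count arithmetic that mirrors gbd_curaddr, gbd_curcount, and gbd_totcount. The struct and function names are hypothetical, and the page size is a parameter that must be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-unit transfer state for illustration. */
struct xfer_state {
    uintptr_t cur_addr;   /* start of the chunk currently in flight */
    size_t    cur_count;  /* bytes in the chunk currently in flight */
    size_t    tot_count;  /* bytes not yet completed, including this chunk */
};

/* Strategy side: the first chunk runs to the end of the first page,
 * but never beyond the total byte count. */
static void xfer_begin(struct xfer_state *s, uintptr_t addr,
                       size_t total, size_t page_size)
{
    size_t to_page_end =
        page_size - (size_t)(addr & (uintptr_t)(page_size - 1));

    s->cur_addr  = addr;
    s->tot_count = total;
    s->cur_count = (total < to_page_end) ? total : to_page_end;
}

/* Interrupt side: account for the chunk that just finished and set up
 * the next one. Returns 1 if another chunk must be started on the
 * device, or 0 when the whole transfer is complete (the point at
 * which a driver would signal biodone()). */
static int xfer_advance(struct xfer_state *s, size_t page_size)
{
    s->cur_addr  += s->cur_count;
    s->tot_count -= s->cur_count;

    if (s->tot_count == 0) {
        s->cur_count = 0;
        return 0;
    }
    /* cur_addr is now page-aligned, so a full page is safe. */
    s->cur_count = (s->tot_count < page_size) ? s->tot_count : page_size;
    return 1;
}
```

Each call to xfer_advance() stands for one completion interrupt, which is how the implicit loop described above arises without any explicit loop in the driver.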

An alternate design might seem conceptually simpler: to put an explicit loop in the pfxstrategy() routine, starting each page transfer and waiting on a semaphore until the pfxintr() routine is called. Such a design keeps the complexity in the pfxstrategy() routine, making the pfxintr() routine as simple as possible. However, it has a high cost in performance because the pfxstrategy() routine must wake up and be dispatched for every page.

Scatter/gather programming can be simplified by the use of the sgset() function, which calculates the physical addresses and lengths for each page in the transfer (see the sgset(D3) reference page). The sgset() function is limited to use with hardware that uses a fixed mapping of bus addresses to memory addresses, which is the case in the workstations supporting GIO. For example, sgset() cannot be used in the Challenge or Onyx line; it always returns -1 in those systems.
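
To make the idea of a per-page address and length list concrete, the following user-space sketch splits a buffer into entries that never cross a page boundary. It is not sgset() and does not reproduce its interface; the struct, the function name build_sg_list, and its parameters are all hypothetical. It also works on the addresses it is given and performs no virtual-to-physical translation, which a real driver would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather entry for illustration. */
struct sg_entry {
    uintptr_t addr;  /* start address of this piece */
    size_t    len;   /* length in bytes; never crosses a page boundary */
};

/* Fill vec with up to max_entries pieces of [addr, addr + len).
 * Returns the number of entries written; *resid (if not NULL)
 * receives the bytes that did not fit. page_size must be a power
 * of two. */
static size_t build_sg_list(uintptr_t addr, size_t len, size_t page_size,
                            struct sg_entry *vec, size_t max_entries,
                            size_t *resid)
{
    size_t n = 0;

    while (len > 0 && n < max_entries) {
        /* Bytes from addr to the end of its page. */
        size_t chunk =
            page_size - (size_t)(addr & (uintptr_t)(page_size - 1));
        if (chunk > len)
            chunk = len;

        vec[n].addr = addr;
        vec[n].len  = chunk;
        addr += chunk;
        len  -= chunk;
        n++;
    }
    if (resid != NULL)
        *resid = len;
    return n;
}
```

A caller would size vec to the number of scatter/gather registers on the device and treat a nonzero residual as the portion of the request that must be handled in a later transfer.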